Use of morphological analysis in protein name recognition
Abstract
Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing in general, and in protein name recognition in particular. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where there are no word boundaries. Evaluated on GENIA corpus 3.02, the proposed method obtains an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared to conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time.
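The boundary disagreement described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the toy lexicon, its costs, the unknown-character fallback, and the example string "NF-kappaB-dependent activation" are all hypothetical, chosen only to show how a dictionary-driven, minimum-cost segmentation over the raw character string (the general idea behind lattice-based morphological analysis) can recover a name that whitespace tokenization leaves buried inside a larger token.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> cost (lower is preferred).
# The space is an ordinary entry, not a hard boundary.
LEXICON: Dict[str, float] = {
    "NF-kappaB": 1.0,
    "dependent": 1.0,
    "activation": 1.0,
    "-": 0.5,
    " ": 0.1,
}
UNKNOWN_CHAR_COST = 5.0  # assumed fallback cost for one unknown character


def segment(text: str, lexicon: Dict[str, float] = LEXICON) -> List[str]:
    """Return a minimum-cost segmentation of text into lexicon entries."""
    n = len(text)
    max_len = max(len(w) for w in lexicon)
    # best[i] = (cost of the best path covering text[:i], start of its last token)
    best: List[Tuple[float, int]] = [(float("inf"), -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in lexicon:
                cost = lexicon[piece]
            elif end - start == 1:
                cost = UNKNOWN_CHAR_COST  # keeps every position reachable
            else:
                continue
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Walk the back-pointers from the end of the string to recover the tokens.
    tokens: List[str] = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        tokens.append(text[start:pos])
        pos = start
    return tokens[::-1]


if __name__ == "__main__":
    sentence = "NF-kappaB-dependent activation"
    print(sentence.split())   # whitespace tokens
    print(segment(sentence))  # lexicon-driven tokens
```

In this sketch, splitting on spaces yields the token "NF-kappaB-dependent", so the name never appears as a unit, while the lexicon-driven segmentation emits "NF-kappaB" as its own token. A real analyzer would also use connection costs between adjacent tokens and a far larger dictionary; both are omitted here.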
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
The Combinational Use Of Knowledge-Based Methods and Morphological Image Processing in Color Image Face Detection
Human face recognition is the basis of all facial processing systems. This work presents a basic method for reducing detection time in fixed images with different color levels. The proposed method is the simplest approach to spatial face localization, since it does not require image dynamics or information about skin color in the image background. In addition, to do face...
TAGARAB: A Fast, Accurate Arabic Name Recognizer Using High-Precision Morphological Analysis
We describe a fast, high-performance name recognizer for Arabic texts. It combines a pattern-matching engine and supporting data with a morphological analysis component. The role of the morphological analysis in accurate name recognition is discussed. We also provide evaluations of both morphological analysis and name recognition.
Study on the genus Phyllosticta (Ascomycota: Phyllostictaceae) from Guilan province (N. Iran)
Species of Phyllosticta are an important group of pathogenic and endophytic fungi reported from various crop plants and from ornamental and non-fruiting plants. These fungi cause several diseases, such as leaf and fruit spots and fruit rot. In order to identify Phyllosticta species in Guilan province, several collections were examined around the region. Altogether 14 taxa were isolated from differ...
Evaluation of Genetic Diversity in Iranian Violet (Viola spp) Populations Using Morphological and RAPD Molecular Markers
Recognition of genetic reserves and desirable genes is the basis of breeding programs. So far in Iran, because genetic resources have not been well characterized, no substantial breeding program has been carried out on native plants. The study of the genetic diversity of violets, a native plant with ornamental and medicinal uses, is of great importance in advancing the breeding goals of this plant. So...
Journal title:
- Journal of biomedical informatics
Volume 37, Issue 6
Pages -
Publication date 2004